Knee Replacement Surgery Outcome

Introduction

This research aims to determine whether the postoperative function of a total knee replacement recipient can be modeled from anthropometric data and details of the surgical procedure. We decided to test models with two postoperative scoring systems as the response variable. The first score is the “Surgeon’s Score,” which measures more objective postoperative metrics about the condition of the replaced knee following surgery: how straight or how flexed the patient can make the knee, how well aligned the knee appears when inspected from a frontal perspective, and how much laxity or separation can be achieved by placing the lower leg under tension. Finally, the patient is asked about the overall pain they experience with the knee. The resulting score is a value between 0 and 100, with 80-100 considered an excellent result.

The second score we modeled is the “Patient’s Score,” which is generated from three questions: how far the patient is able to walk, how well they can navigate stairs, and whether they require any walking aids (crutches, cane, etc.).
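To make the second scoring system concrete, here is a purely illustrative sketch of a patient-style score built from the three questionnaire items and clamped to the 0-100 range. The weights are invented for the sketch and are not the study's actual rubric.

```r
# Purely illustrative: NOT the study's actual rubric. A hypothetical
# patient-style score from the three questionnaire items, clamped to 0-100.
patient_score = function(walk_blocks, stair_level, uses_aid) {
  # walk_blocks: 0-50 (distance walked), stair_level: 0 (unable) to 3 (normal),
  # uses_aid: TRUE if crutches/cane/etc. are required
  score = walk_blocks + 15 * stair_level + ifelse(uses_aid, 0, 5)
  max(0, min(100, score))  # clamp to the 0-100 range
}

patient_score(50, 3, FALSE)  # best case: 100
patient_score(10, 1, TRUE)   # limited mobility with an aid: 25
```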

We hope to create a model that can predict each of the scores based on a variety of categorical and numeric predictors.

Background information

This data has been collected from a Southern California hospital in collaboration with the Shiley Center for Orthopedic Research. The data is used for both clinical research and postoperative monitoring of implant survivorship. All patient-specific information has been removed from the data so that it adheres to the Health Insurance Portability and Accountability Act (HIPAA).

Overview of the Work Flow

We formed our group by advertising on the course forum site and Slack chat. After the group had been formed, the first step was to decide on a topic and a data set for the selected topic. Next, after our proposal had been approved, we cleaned up the data and started working toward finding a final model most suitable for our data set. To reach the final model, each of us worked independently to come up with a list of candidate models. Models were presented and discussed during our regular meetings via Zoom, where ideas were exchanged. Eventually, we converged on the final model used for the work presented here.

Statement

We hope to determine the impact a particular surgeon, or the type of implants used, has on the patient’s outcome after surgery. We are also interested in exploring the potential relationship between anthropometric factors and the patient’s postoperative satisfaction. Finally, we would like to understand whether the considerable expense of computer navigation in knee replacement leads to better postoperative scores. Nevertheless, we keep our options open to investigate other aspects that were not ‘visible’ or considered initially but might reveal themselves as the project progresses.

Here we consider KneeScore_Surgeon or KneeScore_Patient as the response, and Age, Gender, Weight, Height, BMI, Race, Side, and Manufacturer, among others, as predictors. Gender, Race, Side, and Manufacturer are categorical predictors; Age, Height, Weight, and BMI are numerical predictors.

# Load required packages (car for scatterplotMatrix, lmtest for bptest) and the data
library(car)
library(lmtest)

knee = read.csv('knees.csv')
knee$Surgeon = as.factor(knee$Surgeon)
knee$Year = as.factor(knee$Year)

Functions for Model Evaluation

# Function to draw diagnostic plots: Q-Q plot, fitted vs. residuals plot, and histogram of residuals
showPlots = function(model) {
  par(mfcol = c(1, 3))
  qqnorm(resid(model), col = 'red', pch = 16)
  qqline(resid(model), col = 'lightblue', lwd = 2)
  plot(fitted(model), resid(model), xlab = 'Fitted Values', ylab = 'Residuals', main = 'Fitted vs. Residuals Plot', col = 'purple', pch = 16)
  abline(h = 0, col = 'green', lwd = 2)
  hist(resid(model), col = 'blue', main = 'Histogram of Residuals', xlab = 'Residuals')
}

# Function to calculate the statistics we use to evaluate a model
evaluate = function(model){
  # Run each of the tests
  lambda = 1.000001 # workaround so the function still returns a cross-validation score: hat values of exactly 1.0 otherwise cause division by zero
  shapiroTest = shapiro.test(resid(model))
  bpTest = bptest(model)
  rSquared = summary(model)$r.squared
  adjRSquared = summary(model)$adj.r.squared
  loocv = sqrt(mean((resid(model) / (lambda - hatvalues(model))) ^ 2))
  large_hat = sum(hatvalues(model) > 2 * mean(hatvalues(model)))
  large_resid = sum(abs(rstandard(model)) > 2) # count of large standardized residuals (not shown in the summary table)
  large_cooks = sum(cooks.distance(model) > 4 / length(cooks.distance(model)))
  
  # Collect the tests in a data frame
  values = data.frame(Result = c(prettyNum(shapiroTest$p.value), prettyNum(bpTest$p.value), prettyNum(rSquared), prettyNum(adjRSquared), prettyNum(loocv), prettyNum(large_hat), prettyNum(large_cooks)))
  rowNames = data.frame(Test = c('Shapiro Wilk pvalue', 'Breusch-Pagan pvalue', 'R Squared', 'Adj R Squared', 'Cross Validation', 'Large Hat Values', 'Influential'))

  summary_output = cbind(rowNames, values)
  show(summary_output)
  showPlots(model)
}
# find and remove all influential values and return the resulting dataframe
removeInfluential = function(model, data){
  cooks = which(cooks.distance(model) > 4 / length(cooks.distance(model)))
  newData = data[-cooks , ]
  return(newData)
}

# identify all influential data points
IdentifyInfluential = function(model, data){
  cooks = which(cooks.distance(model) > 4 / length(cooks.distance(model)))
  return(cooks)
}

# find large hat values, subtract them from the dataframe and return the dataframe
removeLargeHatValues = function(model, data){
  largeHat = which(hatvalues(model) > 2 * mean(hatvalues(model)))
  newData = data[-largeHat, ]
  return(newData)
}

# calculate RMSE
rmse  = function(actual, predicted) {  sqrt(mean((actual - predicted) ^ 2))}
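The “Cross Validation” figure reported by evaluate() is the closed-form leave-one-out RMSE for a linear model, \(\sqrt{\frac{1}{n}\sum_i (e_i/(1-h_i))^2}\), which avoids refitting the model n times (the lambda constant only guards against hat values of exactly 1). A quick check on R’s built-in mtcars confirms the shortcut matches an explicit leave-one-out loop:

```r
# Verify the LOOCV shortcut formula against explicit leave-one-out refitting.
m = lm(mpg ~ wt + hp, data = mtcars)
loocv_shortcut = sqrt(mean((resid(m) / (1 - hatvalues(m))) ^ 2))

errs = sapply(seq_len(nrow(mtcars)), function(i) {
  fit = lm(mpg ~ wt + hp, data = mtcars[-i, ])            # refit without row i
  mtcars$mpg[i] - predict(fit, mtcars[i, , drop = FALSE]) # held-out error
})
loocv_explicit = sqrt(mean(errs ^ 2))

all.equal(loocv_shortcut, loocv_explicit)  # TRUE
```

The identity is exact for linear models, so the shortcut is not an approximation, merely a fast implementation.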

Knee Score Surgeon

Before moving on to the formal analysis of our data set, we started by performing a visual inspection, plotting dependencies among the variables. The main focus was to look for any visual relationship between KneeScore_Surgeon, KneeScore_Patient, and the rest of the variables. We chose scatterplotMatrix from the car package as it plots regression lines along with the actual data.

scatterplotMatrix(~ KneeScore_Surgeon + KneeScore_Patient + Surgeon + Year + Age + Gender + Weight + Height, span=0.7, data=knee)

scatterplotMatrix(~ KneeScore_Surgeon + KneeScore_Patient + BMI + Diagnosis + Race + Side + GPS + Manufacturer, span=0.7, data=knee)

scatterplotMatrix(~ KneeScore_Surgeon + KneeScore_Patient + FemoralComponentModel + FemoralComponentSize + FemoralComponentType + TibialTrayModel + TibialInsertType, span=0.7, data=knee)

scatterplotMatrix(~ KneeScore_Surgeon + KneeScore_Patient + TibialTraySize + TibialInsertModel + TibialInsertWidth + PatellaModel + PatellaDiameter, span=0.7, data=knee)

Comments: Based on the above graphics, KneeScore_Surgeon and KneeScore_Patient seem to behave similarly across all variables. This is expected, as surgeon and patient should broadly agree on the success or quality of a procedure.

To see this more directly, we plotted the two scores against each other; the plot shows a strong relationship between KneeScore_Surgeon and KneeScore_Patient.

scatterplotMatrix(~ KneeScore_Surgeon + KneeScore_Patient, span=0.7, data=knee)

The distribution of KneeScore_Surgeon, the score assigned by the surgeon, is unimodal, whereas the distribution of KneeScore_Patient, the score assigned by the patient, is multimodal. Our explanation is that the surgeon assigns any number between 1 and 100 without rounding, whereas patients tend to round their score to the nearest five (e.g., 5, 10, 35, 50), which creates the multiple peaks in the distribution.
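This rounding explanation is easy to simulate: starting from a smooth, unimodal set of synthetic scores and rounding to the nearest five reproduces the spiky, multimodal shape (a sketch, not the actual knee data):

```r
# Simulate the "patients round to the nearest five" explanation.
set.seed(42)
true_scores = pmin(100, pmax(0, rnorm(2000, mean = 75, sd = 12)))  # smooth, unimodal
rounded_scores = round(true_scores / 5) * 5                        # rounded to nearest 5

par(mfrow = c(1, 2))
hist(true_scores, breaks = 50, main = "Exact scores", xlab = "Score")
hist(rounded_scores, breaks = 50, main = "Rounded to nearest 5", xlab = "Score")
```

The right-hand histogram shows mass concentrated at multiples of five, mimicking the KneeScore_Patient distribution.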

Begin by fitting a full additive model as a baseline and evaluating.

sample_idx = sample(1:nrow(knee), 1500)
train_data = knee[sample_idx, ]
test_data = knee[-sample_idx, ]
surgeon_full_additive_model = lm(KneeScore_Surgeon ~ ., data = train_data)
evaluate(surgeon_full_additive_model)
##                   Test       Result
## 1  Shapiro Wilk pvalue 3.969607e-06
## 2 Breusch-Pagan pvalue    0.6677602
## 3            R Squared    0.2420614
## 4        Adj R Squared    0.1930753
## 5     Cross Validation     14.70631
## 6     Large Hat Values          166
## 7          Influential           NA
showPlots(surgeon_full_additive_model)

The baseline model has many large hat values. Let’s try to understand why.

largeHat = which(hatvalues(surgeon_full_additive_model) > (2 * mean(hatvalues(surgeon_full_additive_model))))
print(knee[689, ])
##     Surgeon Year Age Gender Weight Height   BMI      Diagnosis  Race  Side
## 689       4 2012  62 Female    200     63 35.46 Osteoarthritis White Right
##     KneeScore_Surgeon KneeScore_Patient GPS Manufacturer
## 689                42                30  No      Stryker
##     FemoralComponentModel FemoralComponentSize FemoralComponentType
## 689             Triathlon                    4   cruciate retaining
##     TibialTrayModel TibialTraySize TibialInsertModel TibialInsertWidth
## 689       Triathlon              4         Triathlon                11
##     TibialInsertType PatellaModel PatellaDiameter
## 689        high flex    Triathlon              32

After inspecting the rows with large hat values, it seems there is a preponderance of values that are either errant entries or valid but rare entries. For example, the Race predictor has only two entries for Mid-East Arabian. First we circle back to the original dataset and clean the identified errant entries.

print(knee[612, ])
##     Surgeon Year Age Gender Weight Height  BMI      Diagnosis     Race
## 612       3 2012  41 Female    182     66 29.4 Osteoarthritis Hispanic
##          Side KneeScore_Surgeon KneeScore_Patient GPS Manufacturer
## 612 Bilateral                39                80 Yes  DePuy (J&J)
##     FemoralComponentModel FemoralComponentSize FemoralComponentType
## 612                 Sigma                   4N   cruciate retaining
##     TibialTrayModel TibialTraySize TibialInsertModel TibialInsertWidth
## 612       PFC Sigma              3         PFC Sigma                 8
##         TibialInsertType            PatellaModel PatellaDiameter
## 612 curved,Fixed bearing Oval Dome Patella 3-peg              35
print(knee[1496, ])
##      Surgeon Year Age Gender Weight Height   BMI      Diagnosis  Race
## 1496       4 2013  66 Female    158     64 27.14 Osteoarthritis White
##       Side KneeScore_Surgeon KneeScore_Patient GPS   Manufacturer
## 1496 Right                74                50  No Smith & Nephew
##      FemoralComponentModel FemoralComponentSize FemoralComponentType
## 1496                Legion                   6N   cruciate retaining
##      TibialTrayModel TibialTraySize TibialInsertModel TibialInsertWidth
## 1496      Genesis II              5        Genesis II                 9
##      TibialInsertType PatellaModel PatellaDiameter
## 1496        high flex   Genesis II              29
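The connection between rare levels and extreme leverage can be seen in a minimal synthetic example: a factor level that occurs only once gets its own coefficient, so that observation’s fitted value is determined entirely by itself and its hat value is exactly 1 (which is also why evaluate() needs its lambda workaround):

```r
# A factor level observed once forces that row's hat value to 1.
set.seed(1)
d = data.frame(
  y = rnorm(20),
  race = factor(c(rep("White", 19), "Mid-East Arabian"))  # one rare level
)
fit = lm(y ~ race, data = d)
max(hatvalues(fit))  # the lone rare-level row has hat value 1
```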

Refitting the model on the cleaned data shows only a marginal impact, so we move on to other methods.

Next we will try to determine which predictors are significant by comparing the full model against models each with a single predictor removed. Evaluation of this code is set to false for convenience; comments show the significance of each test.

surgeon_additive_model = lm(KneeScore_Surgeon ~ .-Surgeon, data = train_data)
Gender_additive_model = lm(KneeScore_Surgeon ~ .-Gender, data = train_data)
Race_additive_model = lm(KneeScore_Surgeon ~ .- Race, data = train_data)
Year_additive_model = lm(KneeScore_Surgeon ~ .- Year, data = train_data)
Age_additive_model = lm(KneeScore_Surgeon ~ .- Age, data = train_data)
Weight_additive_model = lm(KneeScore_Surgeon ~ .- Weight, data = train_data)
Height_additive_model = lm(KneeScore_Surgeon ~ .- Height, data = train_data)
BMI_additive_model = lm(KneeScore_Surgeon ~ .- BMI, data = train_data)
Diagnosis_additive_model = lm(KneeScore_Surgeon ~ .- Diagnosis, data = train_data)
Side_additive_model = lm(KneeScore_Surgeon ~ .- Side, data = train_data)
KneeScore_Patient_additive_model = lm(KneeScore_Surgeon ~ .- KneeScore_Patient, data = train_data)
GPS_additive_model = lm(KneeScore_Surgeon ~ .- GPS, data = train_data)
Manufacturer_additive_model = lm(KneeScore_Surgeon ~ .- Manufacturer, data = train_data)
FemoralComponentModel_additive_model = lm(KneeScore_Surgeon ~ .- FemoralComponentModel, data = train_data)
FemoralComponentSize_additive_model = lm(KneeScore_Surgeon ~ .- FemoralComponentSize, data = train_data)
FemoralComponentType_additive_model = lm(KneeScore_Surgeon ~ .- FemoralComponentType, data = train_data)
TibialTrayModel_additive_model = lm(KneeScore_Surgeon ~ .- TibialTrayModel, data = train_data)
TibialTraySize_additive_model = lm(KneeScore_Surgeon ~ .- TibialTraySize, data = train_data)
TibialInsertModel_additive_model = lm(KneeScore_Surgeon ~ .- TibialInsertModel, data = train_data)
TibialInsertWidth_additive_model = lm(KneeScore_Surgeon ~ .- TibialInsertWidth, data = train_data)
TibialInsertType_additive_model = lm(KneeScore_Surgeon ~ .- TibialInsertType, data = train_data)
PatellaModel_additive_model = lm(KneeScore_Surgeon ~ .- PatellaModel, data = train_data)
PatellaDiameter_additive_model = lm(KneeScore_Surgeon ~ .- PatellaDiameter, data = train_data)

anova(surgeon_additive_model,surgeon_full_additive_model) ## Not significant Pr(>F) 0.2061
anova(Gender_additive_model,surgeon_full_additive_model) ## Not significant Pr(>F) 0.3678
anova(Race_additive_model,surgeon_full_additive_model) ## Not significant Pr(>F) 0.3949 
anova(Year_additive_model,surgeon_full_additive_model) ## * Some significance Pr(>F) 0.039
anova(Age_additive_model,surgeon_full_additive_model) ## *** Significant Pr(>F) 1.191e-11
anova(Weight_additive_model,surgeon_full_additive_model) ## Not significant Pr(>F) 0.4295
anova(Height_additive_model,surgeon_full_additive_model) ## Not significant Pr(>F) 0.6232
anova(BMI_additive_model,surgeon_full_additive_model) ## Not significant Pr(>F) 0.4046
anova(Diagnosis_additive_model,surgeon_full_additive_model) ## Marginally significant Pr(>F) 0.067
anova(Side_additive_model,surgeon_full_additive_model) ## *** Significant Pr(>F) 2.2e-16
anova(KneeScore_Patient_additive_model,surgeon_full_additive_model) ## *** Significant Pr(>F) 2.2e-16
anova(GPS_additive_model,surgeon_full_additive_model) ## Not significant Pr(>F) 0.4046
anova(Manufacturer_additive_model,surgeon_full_additive_model) ## Not significant Pr(>F) 0.7153
anova(FemoralComponentModel_additive_model,surgeon_full_additive_model) ## Not significant Pr(>F) 0.1388
anova(FemoralComponentSize_additive_model,surgeon_full_additive_model) ## Not significant Pr(>F) 0.3345
anova(FemoralComponentType_additive_model,surgeon_full_additive_model) ## Not significant Pr(>F) 0.5606
anova(TibialTrayModel_additive_model,surgeon_full_additive_model) ## Not significant Pr(>F) 0.8707
anova(TibialTraySize_additive_model,surgeon_full_additive_model) ## Not significant Pr(>F) 0.1527
anova(TibialInsertModel_additive_model,surgeon_full_additive_model) ## * Some significance Pr(>F) 0.043
anova(TibialInsertWidth_additive_model,surgeon_full_additive_model) ## Not significant Pr(>F) 0.721
anova(TibialInsertType_additive_model,surgeon_full_additive_model) ## Not significant Pr(>F) 0.613
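The twenty-odd single-predictor comparisons above can be generated in one call with drop1(), which runs the same nested F-test for each term. A self-contained illustration on R’s built-in mtcars (the knee models are not refit here):

```r
# drop1() performs the same drop-one-predictor F-tests in a single call.
full = lm(mpg ~ wt + hp + qsec, data = mtcars)
drop1(full, test = "F")        # one F-test per dropped term

# Equivalent by hand for a single predictor, as in the block above:
no_hp = lm(mpg ~ wt + qsec, data = mtcars)
anova(no_hp, full)             # same F statistic and Pr(>F) as drop1's hp row
```

For single-term deletions the two approaches are numerically identical, so drop1() is simply a more compact way to run this screen.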

Since only a few predictors tested as significant, we can create a smaller model based on those.

surgeon_selective_model = lm(KneeScore_Surgeon ~ Age + Year + Diagnosis + Side + KneeScore_Patient + TibialInsertModel, data = train_data)
evaluate(surgeon_selective_model)
##                   Test       Result
## 1  Shapiro Wilk pvalue 3.789549e-06
## 2 Breusch-Pagan pvalue  0.002675905
## 3            R Squared    0.2110465
## 4        Adj R Squared    0.1954821
## 5     Cross Validation     14.38481
## 6     Large Hat Values          156
## 7          Influential           75

anova(surgeon_selective_model, surgeon_full_additive_model)
## Analysis of Variance Table
## 
## Model 1: KneeScore_Surgeon ~ Age + Year + Diagnosis + Side + KneeScore_Patient + 
##     TibialInsertModel
## Model 2: KneeScore_Surgeon ~ Surgeon + Year + Age + Gender + Weight + 
##     Height + BMI + Diagnosis + Race + Side + KneeScore_Patient + 
##     GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize + 
##     FemoralComponentType + TibialTrayModel + TibialTraySize + 
##     TibialInsertModel + TibialInsertWidth + TibialInsertType + 
##     PatellaModel + PatellaDiameter
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1   1470 297676                           
## 2   1408 285974 62     11702 0.9293 0.6326

The evaluation doesn’t show much improvement; the ANOVA test (Pr(>F) = 0.6326) fails to reject the smaller model, indicating that the dropped predictors were not contributing significantly.

We will attempt to utilize AIC and BIC approaches and evaluate the results.

surgeon_model_back_aic = step(surgeon_full_additive_model, direction = "backward", trace = 0)
evaluate(surgeon_model_back_aic)
##                   Test       Result
## 1  Shapiro Wilk pvalue 5.744359e-05
## 2 Breusch-Pagan pvalue  0.003167732
## 3            R Squared     0.207701
## 4        Adj R Squared    0.1958997
## 5     Cross Validation     14.33432
## 6     Large Hat Values           85
## 7          Influential           69

anova(surgeon_model_back_aic, surgeon_full_additive_model)
## Analysis of Variance Table
## 
## Model 1: KneeScore_Surgeon ~ Year + Age + Weight + BMI + Side + KneeScore_Patient + 
##     FemoralComponentModel + PatellaDiameter
## Model 2: KneeScore_Surgeon ~ Surgeon + Year + Age + Gender + Weight + 
##     Height + BMI + Diagnosis + Race + Side + KneeScore_Patient + 
##     GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize + 
##     FemoralComponentType + TibialTrayModel + TibialTraySize + 
##     TibialInsertModel + TibialInsertWidth + TibialInsertType + 
##     PatellaModel + PatellaDiameter
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1   1477 298938                           
## 2   1408 285974 69     12964 0.9251 0.6511

Removing the influential observations and running AIC again:

clean_dataset_aic = removeInfluential(surgeon_model_back_aic, train_data)
surgeon_full_additive_model_clean = lm(KneeScore_Surgeon ~ ., data = clean_dataset_aic)
surgeon_model_aic_clean = step(surgeon_full_additive_model_clean, direction = "backward", trace = 0)
evaluate(surgeon_model_aic_clean)
##                   Test       Result
## 1  Shapiro Wilk pvalue  0.002316926
## 2 Breusch-Pagan pvalue 0.0005649064
## 3            R Squared    0.2316975
## 4        Adj R Squared    0.2246488
## 5     Cross Validation     13.05352
## 6     Large Hat Values           28
## 7          Influential           59

anova(surgeon_model_aic_clean, surgeon_full_additive_model_clean)
## Analysis of Variance Table
## 
## Model 1: KneeScore_Surgeon ~ Year + Age + Height + BMI + Side + KneeScore_Patient + 
##     TibialTraySize
## Model 2: KneeScore_Surgeon ~ Surgeon + Year + Age + Gender + Weight + 
##     Height + BMI + Diagnosis + Race + Side + KneeScore_Patient + 
##     GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize + 
##     FemoralComponentType + TibialTrayModel + TibialTraySize + 
##     TibialInsertModel + TibialInsertWidth + TibialInsertType + 
##     PatellaModel + PatellaDiameter
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1   1417 239496                           
## 2   1341 228175 76     11321 0.8755 0.7678

Utilize a Bayesian Information Criterion approach and evaluate the selected model.

n = length(resid(surgeon_full_additive_model))
surgeon_model_back_bic = step(surgeon_full_additive_model, direction = "backward", k = log(n), trace = 0)
evaluate(surgeon_model_back_bic)
##                   Test       Result
## 1  Shapiro Wilk pvalue 3.533832e-05
## 2 Breusch-Pagan pvalue   0.02174066
## 3            R Squared    0.1779156
## 4        Adj R Squared     0.175716
## 5     Cross Validation      14.4313
## 6     Large Hat Values           68
## 7          Influential           92

coef(surgeon_model_back_bic)
##       (Intercept)               Age          SideLeft         SideRight 
##        12.3236748         0.2819257         8.7281485         8.5845741 
## KneeScore_Patient 
##         0.2521117
anova(surgeon_model_back_bic, surgeon_full_additive_model)
## Analysis of Variance Table
## 
## Model 1: KneeScore_Surgeon ~ Age + Side + KneeScore_Patient
## Model 2: KneeScore_Surgeon ~ Surgeon + Year + Age + Gender + Weight + 
##     Height + BMI + Diagnosis + Race + Side + KneeScore_Patient + 
##     GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize + 
##     FemoralComponentType + TibialTrayModel + TibialTraySize + 
##     TibialInsertModel + TibialInsertWidth + TibialInsertType + 
##     PatellaModel + PatellaDiameter
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1   1495 310177                              
## 2   1408 285974 87     24202 1.3697 0.01561 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The resulting BIC model scores marginally better in cross validation, though the ANOVA results (Pr(>F) = 0.0156) clearly prefer the full additive model.

Let’s try interactions among the predictors in the model selected by backward BIC.

surgeon_model_back_bic_int = lm(KneeScore_Surgeon ~ Age * Side * KneeScore_Patient, data = train_data)
evaluate(surgeon_model_back_bic_int)
##                   Test       Result
## 1  Shapiro Wilk pvalue 5.860884e-06
## 2 Breusch-Pagan pvalue  0.001950516
## 3            R Squared    0.2038934
## 4        Adj R Squared    0.1980082
## 5     Cross Validation     14.29154
## 6     Large Hat Values          138
## 7          Influential           94

anova(surgeon_model_back_bic_int, surgeon_full_additive_model)
## Analysis of Variance Table
## 
## Model 1: KneeScore_Surgeon ~ Age * Side * KneeScore_Patient
## Model 2: KneeScore_Surgeon ~ Surgeon + Year + Age + Gender + Weight + 
##     Height + BMI + Diagnosis + Race + Side + KneeScore_Patient + 
##     GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize + 
##     FemoralComponentType + TibialTrayModel + TibialTraySize + 
##     TibialInsertModel + TibialInsertWidth + TibialInsertType + 
##     PatellaModel + PatellaDiameter
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1   1488 300375                           
## 2   1408 285974 80     14401 0.8863 0.7519
coef(surgeon_model_back_bic_int)
##                     (Intercept)                             Age 
##                    -16.82893122                      0.85626307 
##                        SideLeft                       SideRight 
##                     35.34068923                     23.94000851 
##               KneeScore_Patient                    Age:SideLeft 
##                      0.98984321                     -0.58092659 
##                   Age:SideRight           Age:KneeScore_Patient 
##                     -0.43222169                     -0.01384533 
##      SideLeft:KneeScore_Patient     SideRight:KneeScore_Patient 
##                     -0.67082908                     -0.61485672 
##  Age:SideLeft:KneeScore_Patient Age:SideRight:KneeScore_Patient 
##                      0.01374840                      0.01316275

Perhaps removing some influential data points will help create a better model. Next we remove points flagged by the large Cook’s distance criterion and evaluate BIC again.

clean_dataset = removeInfluential(surgeon_model_back_bic, train_data)
surgeon_full_additive_model_clean = lm(KneeScore_Surgeon ~ ., data = clean_dataset)
surgeon_model_back_bic_clean = step(surgeon_full_additive_model_clean, direction = "backward", k = log(n), trace = 0)

evaluate(surgeon_model_back_bic_clean)
##                   Test       Result
## 1  Shapiro Wilk pvalue 0.0003205851
## 2 Breusch-Pagan pvalue 0.0001228871
## 3            R Squared    0.2288343
## 4        Adj R Squared    0.2266357
## 5     Cross Validation      12.4029
## 6     Large Hat Values           73
## 7          Influential           53

anova(surgeon_model_back_bic_clean, surgeon_full_additive_model_clean)
## Analysis of Variance Table
## 
## Model 1: KneeScore_Surgeon ~ Age + Side + KneeScore_Patient
## Model 2: KneeScore_Surgeon ~ Surgeon + Year + Age + Gender + Weight + 
##     Height + BMI + Diagnosis + Race + Side + KneeScore_Patient + 
##     GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize + 
##     FemoralComponentType + TibialTrayModel + TibialTraySize + 
##     TibialInsertModel + TibialInsertWidth + TibialInsertType + 
##     PatellaModel + PatellaDiameter
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1   1403 215164                              
## 2   1316 198050 87     17114 1.3071 0.03408 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Next, we fit the interaction model on the cleaned dataset.

surgeon_model_back_bic_int_clean = lm(KneeScore_Surgeon ~ Age * Side * KneeScore_Patient, data = clean_dataset)
evaluate(surgeon_model_back_bic_int_clean)
##                   Test      Result
## 1  Shapiro Wilk pvalue 0.000194294
## 2 Breusch-Pagan pvalue 0.002528189
## 3            R Squared   0.2459378
## 4        Adj R Squared    0.239996
## 5     Cross Validation    12.30977
## 6     Large Hat Values         131
## 7          Influential          44

Comparing the BIC and interaction models:

anova(surgeon_model_back_bic_clean, surgeon_model_back_bic_int_clean)
## Analysis of Variance Table
## 
## Model 1: KneeScore_Surgeon ~ Age + Side + KneeScore_Patient
## Model 2: KneeScore_Surgeon ~ Age * Side * KneeScore_Patient
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1   1403 215164                                  
## 2   1396 210392  7    4772.1 4.5234 5.321e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Based on the ANOVA test (Pr(>F) = 5.3e-05), we reject surgeon_model_back_bic_clean in favor of surgeon_model_back_bic_int_clean.

Next, we will evaluate the RMSE of the best performing models thus far.

Model               Train RMSE   Test RMSE
Model All+          13.8075825   14.1531654
Selective Model     14.0872534   14.0449973
Model AIC           14.1170897   14.5745219
Model BIC           14.3799985   14.1825336
Model Int           14.1509713   14.131628
Model Clean All+    14.0419197   14.2632453
Model Clean AIC     14.3040352   14.1389152
Model Clean BIC     14.3884783   14.1955392
Model Clean Int     14.2282859   14.1009845

The interaction model gives among the best test RMSE values, together with a low cross-validation score and a high adjusted \(R^2\), so we consider it the best model for predicting KneeScore_Surgeon.
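The RMSE table above was built with the rmse() helper defined earlier. A self-contained sketch of that train/test workflow, using R’s built-in mtcars in place of the knee data (the fitted knee models are not reproduced here):

```r
# Train/test RMSE workflow, sketched on mtcars instead of the knee data.
rmse = function(actual, predicted) sqrt(mean((actual - predicted) ^ 2))

set.seed(420)
idx = sample(seq_len(nrow(mtcars)), 24)   # hold out 8 of 32 rows
train = mtcars[idx, ]
test = mtcars[-idx, ]

fit = lm(mpg ~ wt * hp, data = train)     # an interaction model, as above
train_rmse = rmse(train$mpg, predict(fit, train))
test_rmse = rmse(test$mpg, predict(fit, test))
c(train = train_rmse, test = test_rmse)
```

Comparing the two values is the same overfitting check applied in the table: a test RMSE far above the train RMSE signals a model that does not generalize.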

Knee Score Patient

Begin by fitting a full additive model as a baseline and evaluating.

pat_full_additive_model = lm(KneeScore_Patient ~ ., data = knee)
evaluate(pat_full_additive_model)
##                   Test       Result
## 1  Shapiro Wilk pvalue 1.177599e-13
## 2 Breusch-Pagan pvalue     0.257962
## 3            R Squared     0.296855
## 4        Adj R Squared     0.261377
## 5     Cross Validation     17.00694
## 6     Large Hat Values          204
## 7          Influential           NA

The baseline model has many large leverage values which, upon inspection, appear to come from several of the implant fields. Perhaps removing the suspect implant predictors will help improve the model.

pat_model_eight = lm(KneeScore_Patient ~ . - Diagnosis - Race - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - TibialTraySize - FemoralComponentType, data = knee)
evaluate(pat_model_eight)
##                   Test       Result
## 1  Shapiro Wilk pvalue 2.572268e-14
## 2 Breusch-Pagan pvalue 0.0001090169
## 3            R Squared    0.2776021
## 4        Adj R Squared    0.2567126
## 5     Cross Validation     16.92618
## 6     Large Hat Values          125
## 7          Influential           NA

anova(pat_model_eight, pat_full_additive_model)
## Analysis of Variance Table
## 
## Model 1: KneeScore_Patient ~ (Surgeon + Year + Age + Gender + Weight + 
##     Height + BMI + Diagnosis + Race + Side + KneeScore_Surgeon + 
##     GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize + 
##     FemoralComponentType + TibialTrayModel + TibialTraySize + 
##     TibialInsertModel + TibialInsertWidth + TibialInsertType + 
##     PatellaModel + PatellaDiameter) - Diagnosis - Race - TibialInsertType - 
##     FemoralComponentModel - PatellaModel - TibialTrayModel - 
##     TibialTraySize - FemoralComponentType
## Model 2: KneeScore_Patient ~ Surgeon + Year + Age + Gender + Weight + 
##     Height + BMI + Diagnosis + Race + Side + KneeScore_Surgeon + 
##     GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize + 
##     FemoralComponentType + TibialTrayModel + TibialTraySize + 
##     TibialInsertModel + TibialInsertWidth + TibialInsertType + 
##     PatellaModel + PatellaDiameter
##   Res.Df    RSS Df Sum of Sq     F Pr(>F)  
## 1   1902 532327                            
## 2   1863 518140 39     14187 1.308 0.0973 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

While removing the predictors doesn’t yield a definitively better model at a reasonable \(\alpha\), it does slightly improve the cross-validation score, at the cost of a small drop in adjusted \(R^2\).

Utilize a Bayesian Information Criterion approach and evaluate the selected model.

n = length(resid(pat_full_additive_model))
pat_model_back_bic = step(pat_full_additive_model, direction = "backward", k = log(n), trace = 0)
evaluate(pat_model_back_bic)
##                   Test       Result
## 1  Shapiro Wilk pvalue 2.427597e-15
## 2 Breusch-Pagan pvalue 7.235823e-07
## 3            R Squared    0.2424413
## 4        Adj R Squared    0.2405008
## 5     Cross Validation     16.93997
## 6     Large Hat Values          112
## 7          Influential          148

anova(pat_model_back_bic, pat_full_additive_model)
## Analysis of Variance Table
## 
## Model 1: KneeScore_Patient ~ Age + Gender + Weight + Height + KneeScore_Surgeon
## Model 2: KneeScore_Patient ~ Surgeon + Year + Age + Gender + Weight + 
##     Height + BMI + Diagnosis + Race + Side + KneeScore_Surgeon + 
##     GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize + 
##     FemoralComponentType + TibialTrayModel + TibialTraySize + 
##     TibialInsertModel + TibialInsertWidth + TibialInsertType + 
##     PatellaModel + PatellaDiameter
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1   1952 558236                                  
## 2   1863 518140 89     40097 1.6199 0.0003004 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The resulting BIC model scores marginally better in cross validation, though the ANOVA results clearly prefer the full additive model.

Next, we will evaluate the RMSE of the best performing models thus far.

options(warn=-1)
sample_idx = sample(1:nrow(knee), 1500)
train_data = knee[sample_idx, ]
test_data = knee[-sample_idx, ]

rmse  = function(actual, predicted) {  sqrt(mean((actual - predicted) ^ 2))}

pat_full_test = rmse(test_data$KneeScore_Patient, predict(pat_full_additive_model, test_data))
pat_full_train = rmse(train_data$KneeScore_Patient, predict(pat_full_additive_model, train_data))

pat_test_mod_8 = rmse(test_data$KneeScore_Patient, predict(pat_model_eight, test_data))
pat_train_mod_8 = rmse(train_data$KneeScore_Patient, predict(pat_model_eight, train_data))

pat_bic_test = rmse(test_data$KneeScore_Patient, predict(pat_model_back_bic, test_data))
pat_bic_train = rmse(train_data$KneeScore_Patient, predict(pat_model_back_bic, train_data))
Model        Train RMSE   Test RMSE
Model All+   13.8075825   14.1531654
Model 8      16.5291748   16.3548583
Model BIC    16.9360603   16.7169659

The additive model with all predictors has the best RMSE values.

Perhaps removing some influential data points will assist in creating a better model. Next we remove points flagged by the large Cook’s distance criterion (and, afterwards, large leverage values).
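The 4/n Cook’s distance cutoff used by removeInfluential() can be sanity-checked on synthetic data: a single planted outlier is flagged, and refitting without the flagged points recovers the underlying slope (a sketch, not the knee data):

```r
# Sanity check of the 4/n Cook's distance rule on synthetic data.
set.seed(7)
d = data.frame(x = runif(100))
d$y = 2 * d$x + rnorm(100, sd = 0.1)   # true slope is 2
d$y[100] = 10                          # plant one gross outlier

fit = lm(y ~ x, data = d)
flagged = which(cooks.distance(fit) > 4 / nrow(d))  # same cutoff as removeInfluential()
100 %in% flagged                       # TRUE: the planted outlier is caught

clean_fit = lm(y ~ x, data = d[-flagged, ])
coef(clean_fit)["x"]                   # slope recovered near the true value 2
```

The rule is deliberately aggressive (it typically flags a handful of ordinary points too), which is why each removal step here is followed by re-evaluation rather than applied blindly.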

new_dataframe = removeInfluential(pat_model_eight, knee)
pat_model_nine = lm(KneeScore_Patient ~ . - Diagnosis - Race - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - TibialTraySize - FemoralComponentType, data = new_dataframe)
evaluate(pat_model_nine)
##                   Test       Result
## 1  Shapiro Wilk pvalue 1.475761e-09
## 2 Breusch-Pagan pvalue  0.001662082
## 3            R Squared    0.3104628
## 4        Adj R Squared    0.2893819
## 5     Cross Validation     15.11494
## 6     Large Hat Values           97
## 7          Influential           NA

This does appear to provide a moderate benefit in terms of adjusted \(R^2\) and cross-validation RMSE.
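The `evaluate` helper that summarizes each fit is likewise defined in our setup code. A self-contained sketch matching the columns shown above, under several assumptions (LOOCV RMSE for the cross-validation entry, the \(2p/n\) hat-value cutoff, the \(4/n\) Cook's distance cutoff, and a base-R Koenker form of the Breusch-Pagan test in place of `lmtest::bptest`), could look like:

```r
# Hypothetical sketch of evaluate(); the real helper may differ in details.
evaluate = function(model) {
  n = length(resid(model))
  p = length(coef(model))
  # Koenker-style Breusch-Pagan test in base R (avoids the lmtest dependency):
  # regress squared residuals on fitted values, compare n * R^2 to chi-squared.
  aux = lm(resid(model)^2 ~ fitted(model))
  bp_pvalue = pchisq(n * summary(aux)$r.squared, df = 1, lower.tail = FALSE)
  # LOOCV RMSE via the leverage shortcut for linear models
  # (undefined if any hat value equals 1)
  loocv = sqrt(mean((resid(model) / (1 - hatvalues(model)))^2))
  data.frame(
    Test = c("Shapiro Wilk pvalue", "Breusch-Pagan pvalue", "R Squared",
             "Adj R Squared", "Cross Validation", "Large Hat Values",
             "Influential"),
    Result = c(shapiro.test(resid(model))$p.value, bp_pvalue,
               summary(model)$r.squared, summary(model)$adj.r.squared,
               loocv,
               sum(hatvalues(model) > 2 * p / n),
               sum(cooks.distance(model) > 4 / n))
  )
}
```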

We will evaluate the same model once again after removing the identified large hat values.
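`removeLargeHatValues` also comes from our setup code; a minimal sketch under the same \(2p/n\) rule of thumb might be:

```r
# Hypothetical sketch of removeLargeHatValues: keep only rows whose hat
# value is at most twice the average leverage p/n. Assumes the rows of
# `data` align with the observations used to fit `model`.
removeLargeHatValues = function(model, data) {
  n = length(resid(model))
  p = length(coef(model))
  data[hatvalues(model) <= 2 * p / n, ]
}
```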

remove_hat_dataframe = removeLargeHatValues(pat_model_nine, new_dataframe)
pat_model_ten = lm(KneeScore_Patient ~ . - Diagnosis - Race - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - TibialTraySize - FemoralComponentType, data = remove_hat_dataframe)
evaluate(pat_model_ten)
##                   Test       Result
## 1  Shapiro Wilk pvalue 3.905128e-08
## 2 Breusch-Pagan pvalue  0.002867139
## 3            R Squared    0.3047602
## 4        Adj R Squared    0.2893913
## 5     Cross Validation     15.40933
## 6     Large Hat Values           96
## 7          Influential           94

Removing the large hat values actually worsened the cross-validation RMSE (15.41 vs. 15.11). Perhaps we can find a better model by running backward BIC on the modified data frame.

n = length(resid(pat_model_ten))
pat_model_10_bic = step(pat_model_ten, direction = "backward", k = log(n), trace = 0)
evaluate(pat_model_10_bic)
##                   Test       Result
## 1  Shapiro Wilk pvalue 9.345871e-09
## 2 Breusch-Pagan pvalue  0.001236141
## 3            R Squared    0.2778953
## 4        Adj R Squared    0.2758345
## 5     Cross Validation     15.44057
## 6     Large Hat Values           92
## 7          Influential          124

Despite the reduction in high-leverage observations, the selected model is no better than the preceding models.

sample_idx_two = sample(1:nrow(remove_hat_dataframe), 1300)
train_data_two = remove_hat_dataframe[sample_idx_two, ]
test_data_two = remove_hat_dataframe[-sample_idx_two, ]

# evaluate each model on the held-out test set and on the training set
test_mod_10 = rmse(test_data_two$KneeScore_Patient, predict(pat_model_ten, test_data_two))
train_mod_10 = rmse(train_data_two$KneeScore_Patient, predict(pat_model_ten, train_data_two))

test_10_bic = rmse(test_data_two$KneeScore_Patient, predict(pat_model_10_bic, test_data_two))
train_10_bic = rmse(train_data_two$KneeScore_Patient, predict(pat_model_10_bic, train_data_two))
Model          Train RMSE   Test RMSE
Model 10       15.103835    19.9564655
Model 10 BIC   15.4178179   19.8407548

Interestingly, the models fit on the dataset without high-leverage points again fare worse on test RMSE.

Conclusion

  1. Does the satisfaction score depend on age group?
  2. Does GPS-guided placement improve the score compared with manual placement?
  3. Does the score depend on gender?
  4. Does GPS improve the score consistently across surgeons?
  5. Are devices from some vendors better suited to GPS placement than others?
  6. Do the satisfaction scores depend on BMI, height, and weight?
  7. Are some vendors' devices better suited to particular BMI, height, and weight ranges?
  8. Are there any dependencies between the scores and the various replacement components?
  9. Which components appear to matter most for the score?
  10. Is there any relationship between components from various vendors and race?
  11. Which combinations of race and gender have the highest scores?

Appendix

# Warning: takes hours to fit
all_two_way_model = lm(KneeScore_Surgeon ~ .*., data = knee)
# Warning...
n = length(resid(all_two_way_model))
model_one_bic_all_two_way = step(all_two_way_model, direction = 'backward', k = log(n), trace = 0)
model_one_pat = lm(log(KneeScore_Patient) ~ ., data = knee)
model_one_surg = lm(log(KneeScore_Surgeon) ~ ., data = knee)
model_two_exp_pat= lm(KneeScore_Patient^2 ~ ., data = knee)
model_two_exp_surg = lm(KneeScore_Surgeon^2 ~ ., data = knee)
# replace all 0 values with the mean of the vector
temp = knee
temp$KneeScore_Patient[temp$KneeScore_Patient == 0] = mean(temp$KneeScore_Patient)

model_three = lm(KneeScore_Patient ~ ., data = temp)
evaluate(model_three)
clean_temp_data = removeLargeHatValues(model_three, temp)
model_four = lm(KneeScore_Patient ~ ., data = clean_temp_data)
evaluate(model_four)
pat_model = lm(KneeScore_Patient ~. , data = knee)
pat_model_back_aic = step(pat_model, direction = "backward", trace = 0)
pat_model_for_aic = step(pat_model, direction = "forward", trace = 0)  # no-op: forward selection from the full model has no terms to add
pat_model_both_aic = step(pat_model, direction = "both", trace = 0)

n = length(resid(pat_model))
pat_model_back_bic = step(pat_model, direction = "backward", k = log(n), trace = 0)
pat_model_for_bic = step(pat_model, direction = "forward", k = log(n), trace = 0)  # no-op: forward selection from the full model has no terms to add
pat_model_both_bic = step(pat_model, direction = "both", k = log(n), trace = 0)
evaluate(pat_model)
evaluate(pat_model_back_aic)
evaluate(pat_model_for_aic)
evaluate(pat_model_both_aic)
## Adj.R.Squared: 0.24 - CrossValidation: 16.9
evaluate(pat_model_back_bic)
evaluate(pat_model_for_bic)
evaluate(pat_model_both_bic)
## Remove large leverages
new_data = removeInfluential(pat_model, knee)
pat_model_without_large_leverages = lm(KneeScore_Patient ~ ., data = new_data)
evaluate(pat_model_without_large_leverages)

Removing the records with large hat values and refitting simply introduces more large hat values. What is unique about these rows of data?

# hat values numerically equal to 1 mark observations that are fit exactly
# (a tolerance avoids exact floating-point comparison)
which(hatvalues(pat_model) > 1 - 1e-8)
print(knee[17,])
print(knee[240,])
print(knee[638,])
print(knee[1093,])
print(knee[1775, ])
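One likely explanation, which can be checked against the printed rows: an observation has a hat value of exactly 1 when it alone determines a fitted coefficient, for instance when it carries the only occurrence of some factor level (such as a rare component model). A toy demonstration of this effect:

```r
# An observation that is the sole carrier of a factor level is fit
# exactly, so its hat value is 1 and it contributes nothing to prediction.
set.seed(1)
toy = data.frame(
  y   = rnorm(10),
  grp = factor(c(rep("common", 9), "rare"))  # one row with a unique level
)
fit_toy = lm(y ~ grp, data = toy)
hatvalues(fit_toy)  # the "rare" row has hat value 1; each "common" row has 1/9
```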
pat_model_nine = lm(KneeScore_Patient ~ . - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - FemoralComponentType, data = knee)
evaluate(pat_model_nine)

References